Shengju Qian

Policy and World Modeling Co-Training for Language Agents

Jun 01, 2026
Via arXiv

On-Policy Adversarial Flow Distillation for Autoregressive Video Generation

May 25, 2026
Via arXiv

SAP: Segment Any 4K Panorama

Mar 13, 2026
Via arXiv

CoSMo3D: Open-World Promptable 3D Semantic Part Segmentation through LLM-Guided Canonical Spatial Modeling

Mar 01, 2026
Via arXiv

AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer

Feb 12, 2026
Via arXiv

StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation

May 26, 2025
Via arXiv

MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation

Mar 27, 2025
Via arXiv

Text-Animator: Controllable Visual Text Video Generation

Jun 25, 2024
Via arXiv

ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

Apr 23, 2024
Via arXiv

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Mar 25, 2024
Via arXiv